Bandit Online Learning in Pseudo-Monotone Games with Multi-Point Pseudo-Gradient Estimate

Authors: Hu Jianghai, Huang Yuanhanqing
Publication date: 24/07/2023

Non-cooperative games serve as a powerful framework for capturing the interactions among self-interested players and have broad applicability in modeling practical scenarios, from power management to drug delivery. Although most existing solution algorithms assume the availability of first-order information or full knowledge of the objectives and others' action profiles, there are situations where the only information at the players' disposal is the realized objective function values. In this paper, we devise a bandit online learning algorithm that integrates the optimistic mirror descent scheme and multi-point pseudo-gradient estimates. We further demonstrate that the generated actual sequence of play can converge a.s. to a critical point if the game under study is merely coherent, without resorting to extra Tikhonov regularization terms or additional norm conditions. Finally, we illustrate the validity of the proposed algorithm via a Rock-Paper-Scissors game and a least-squares estimation game.

arXiv.org e-Print Archive
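
The multi-point estimate mentioned in the abstract above builds gradient information from realized objective values alone. The following is a minimal sketch of a generic two-point zeroth-order estimator, assuming a smooth objective `f`, a smoothing radius `delta`, and directions drawn uniformly from the unit sphere. The function names are hypothetical, and this is not claimed to be the estimator used in the paper.

```python
import math
import random


def random_unit_vector(dim):
    """Sample a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [random.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:  # guard against the (measure-zero) all-zero draw
            return [c / norm for c in v]


def two_point_estimate(f, x, delta):
    """Estimate the gradient of f at x from two function evaluations.

    Assumption: f accepts a list of floats and returns a float.
    The estimate is (dim / (2 * delta)) * (f(x + delta*u) - f(x - delta*u)) * u.
    """
    dim = len(x)
    u = random_unit_vector(dim)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = dim * (f(x_plus) - f(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]
```

For a linear objective f(x) = a·x, each estimate equals dim · (a·u) · u, whose expectation over uniform unit directions u is exactly a, so averaging many estimates recovers the gradient.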

A Bandit Learning Method for Continuous Games under Feedback Delays with Residual Pseudo-Gradient Estimate

Authors: Hu Jianghai, Huang Yuanhanqing
Publication date: 28/03/2023

Learning in multi-player games can model a large variety of practical scenarios in which each player seeks to optimize its own local objective function, which in turn depends on the actions taken by others. Motivated by the frequent absence of first-order information such as partial gradients in solving local optimization problems and the prevalence of asynchronicity and feedback delays in multi-agent systems, we introduce a bandit learning algorithm, which integrates mirror descent, residual pseudo-gradient estimates, and the priority-based feedback utilization strategy, to contend with these challenges. We establish that for pseudo-monotone plus games, the actual sequences of play generated by the proposed algorithm converge a.s. to critical points. Compared with the existing method, the proposed algorithm yields more consistent estimates with less variation and allows for more aggressive choices of parameters. Finally, we illustrate the validity of the proposed algorithm through a thermal load management problem of building complexes.

arXiv.org e-Print Archive

A Study of the Duality between Kalman Filters and LQR Problems

Authors: Hu Jianghai, Lee Dong-Hwan
Publication venue: Purdue University (bepress)
Publication date: 03/11/2016

The goal of this paper is to study a connection between the finite-horizon Kalman filtering and LQR problems for discrete-time LTI systems. Motivated by recent duality results on the LQR problem, a Lagrangian dual relation is used to prove that the Kalman filtering problem is a Lagrange dual problem of the LQR problem.

Purdue E-Pubs
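
As background for the Kalman/LQR connection in the abstract above, the following minimal sketch checks the classical Riccati-level duality numerically: one covariance update of the Kalman filter for (A, C, W, v) coincides with one Riccati step of the LQR problem for (Aᵀ, Cᵀ, W, v). This is the textbook duality, not the Lagrangian argument of the paper. The helper names are hypothetical, and a single input and single output are assumed so that the inverted term is a scalar.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*X)]


def riccati_step(P, A, B, Q, r):
    """LQR step: Q + A'PA - A'PB (r + B'PB)^{-1} B'PA, with B of size n x 1."""
    At, Bt = transpose(A), transpose(B)
    PA, PB = matmul(P, A), matmul(P, B)
    s = r + matmul(Bt, PB)[0][0]          # scalar because of the single input
    AtPA, AtPB, BtPA = matmul(At, PA), matmul(At, PB), matmul(Bt, PA)
    n = len(A)
    return [[Q[i][j] + AtPA[i][j] - AtPB[i][0] * BtPA[0][j] / s
             for j in range(n)] for i in range(n)]


def kalman_step(S, A, C, W, v):
    """Filter step: W + ASA' - ASC' (v + CSC')^{-1} CSA', with C of size 1 x n."""
    At, Ct = transpose(A), transpose(C)
    SAt, SCt = matmul(S, At), matmul(S, Ct)
    s = v + matmul(C, SCt)[0][0]          # scalar because of the single output
    ASAt, ASCt, CSAt = matmul(A, SAt), matmul(A, SCt), matmul(C, SAt)
    n = len(A)
    return [[W[i][j] + ASAt[i][j] - ASCt[i][0] * CSAt[0][j] / s
             for j in range(n)] for i in range(n)]
```

Calling `kalman_step(S, A, C, W, v)` and `riccati_step(S, transpose(A), transpose(C), W, v)` with the same symmetric matrix `S` should give the same result up to floating-point rounding.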

A Semidefinite Programming Formulation of the LQR Problem and Its Dual

Authors: Hu Jianghai, Lee Dong-Hwan
Publication venue: Purdue University (bepress)
Publication date: 09/11/2016

The goal of this paper is to derive a modified formulation of the finite-horizon LQR problem, which can be cast as semidefinite programming problems (SDPs). In addition, based on Lagrangian duality, its dual problem is studied. We establish connections between the proposed primal-dual conditions and existing results. As an application of the proposed results, the decentralized LQR analysis and design problems are addressed. In particular, using the structure of the derived LQR formulations, a sufficient yet simple and convex surrogate problem is developed for solving decentralized LQR design problems.

Purdue E-Pubs

Stabilizing Switched Linear Systems under Adversarial Switching

Authors: Hu Jianghai, Lee Dong-Hwan, Shen Jinglai
Publication venue: Purdue University (bepress)
Publication date: 15/09/2015

The problem of stabilizing discrete-time switched linear control systems using continuous input by the user and against adversarial switching by an adversary is studied. It is assumed that the adversary has the advantage in that at each time it knows the user's decision on the continuous control input but not vice versa. Stabilizability conditions and bounds on the fastest stabilizing rates are derived. Examples are given to illustrate the results.

Purdue E-Pubs